Spam messages recognizing method based on word embedding and convolutional neural network

LAI Wenhui, QIAO Yupeng

Journal of Computer Applications 2018, 38 (9): 2469-2476. DOI: 10.11772/j.issn.1001-9081.2018030643


Filtering and recognizing spam messages has considerable social value. Traditional manually designed feature selection methods may lead to data sparseness, insufficient co-occurrence of feature information, and difficulty in feature extraction. To solve these problems, a spam message recognition method based on word embedding and a convolutional neural network was proposed. First, word2vec's skip-gram model, trained on the Wiki Chinese corpus, was used to obtain the word embedding of each word in the short message dataset, and each short message was represented as a two-dimensional feature matrix composed of the embeddings of its words. Then, the feature matrix was used as the input to the convolutional neural network. Multi-scale short message features were extracted by convolution kernels of different sizes in the convolution layer, and the 1-max pooling strategy was used to obtain the locally optimal features. Finally, the fused feature vector, composed of the locally optimal features, was fed into a softmax classifier to obtain the classification result. Experiments were performed on 100000 short messages. The results show that the recognition accuracy of the convolutional neural network model can reach 99.5%, which is 2.4% to 5.1% higher than that of traditional machine learning models using the same feature extraction method, and that the recognition accuracy of every model remains above 94%.
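The pipeline described in the abstract (feature matrix, multi-scale convolution, 1-max pooling, softmax) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the embedding dimension, the kernel heights (2, 3, 4), the ReLU activation, and the randomly initialised embeddings and weights are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def conv_1max(features, kernel, bias=0.0):
    """Slide an (h, d) kernel over an (n, d) feature matrix and keep the maximum.

    ReLU is an assumed activation; the abstract does not name one.
    """
    n = features.shape[0]
    h = kernel.shape[0]
    # One activation per window of h consecutive words.
    acts = np.array([
        np.sum(features[i:i + h] * kernel) + bias
        for i in range(n - h + 1)
    ])
    # 1-max pooling: keep only the strongest response of this kernel.
    return np.maximum(acts, 0.0).max()

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(features, kernels, weights, bias):
    """Fuse the pooled features of all kernels and apply a softmax layer."""
    fused = np.array([conv_1max(features, k) for k in kernels])
    return softmax(weights @ fused + bias)

# Assumed toy sizes: a 10-word message with 8-dimensional embeddings.
rng = np.random.default_rng(0)
n_words, dim = 10, 8
# Random stand-in for the word2vec embeddings of the message's words.
message = rng.normal(size=(n_words, dim))
# Two kernels for each assumed height, giving multi-scale features.
kernels = [rng.normal(size=(h, dim)) for h in (2, 3, 4) for _ in range(2)]
# Untrained softmax layer with two classes (spam, not spam).
weights = rng.normal(size=(2, len(kernels)))
bias = np.zeros(2)

probs = classify(message, kernels, weights, bias)
print(probs)
```

Because each kernel contributes a single pooled value, the fused vector has a fixed length regardless of how many words the message contains, which is what lets messages of different lengths share one softmax layer.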
